Abstract

We present a new robust line matching algorithm for solving
the model-to-image registration problem. Given a model
consisting of 3D lines and a cluttered perspective image of
this model, the algorithm simultaneously estimates the pose
of the model and the correspondences of model lines to image
lines. The algorithm combines softassign for determining
correspondences and POSIT for determining pose. Integrating
these algorithms into a deterministic annealing procedure
allows the correspondence and pose to evolve from
initially uncertain values to a joint local optimum. This
research extends to line features the SoftPOSIT algorithm
proposed recently for point features. Lines detected in images
are typically more stable than points and are less likely to
be produced by clutter and noise, especially in man-made
environments. Experiments on synthetic and real imagery
with high levels of clutter, occlusion, and noise demonstrate
the robustness of the algorithm.

1.  Introduction 

This paper presents an algorithm for solving the model-to-image
registration problem using line features. This is the
task of determining the position and orientation (the pose)
of a three-dimensional object with respect to a camera coordinate
system given a model of the object consisting of 3D
reference features and a single 2D image of these features.
We assume that no additional information is available with
which to constrain the pose of the object or to constrain the
correspondence of model to image features. This is also
known as the simultaneous pose and correspondence problem.

*Partial support of NSF awards 0086162, 9905844, and 9987944 is
gratefully acknowledged.

Automatic registration of 3D models to images is a fundamental
and open problem in computer vision. Applications
include object recognition, object tracking, site inspection
and updating, and autonomous navigation when
scene models are available. It is a difficult problem because
it comprises two coupled problems, the correspondence
problem and the pose problem, each easy to solve
only if the other has been solved first:

1. Solving the pose problem consists of finding the rotation
and translation of the object with respect to the
camera coordinate system. Given matching model and
image features, one can easily determine the pose that
best aligns those matches [5].

2. Solving the correspondence problem consists of finding
matching image features and model features. If
the object pose is known, one can relatively easily determine
the matching features. Projecting the model
in the known pose into the original image, one can
identify matches according to the model features that
project sufficiently close to an image feature.

The classic approach to solving these coupled problems is
the hypothesize-and-test approach. In this approach, a small
set of image feature to model feature correspondences is
first hypothesized. Based on these correspondences, the
pose of the object is computed. Using this pose, the model
points are back-projected into the image. If the original and
back-projected images are sufficiently similar, then the pose
is accepted; otherwise, a new hypothesis is formed and this
process is repeated. Perhaps the best known example of this
approach is the RANSAC algorithm [6] for the case that no
information is available to constrain the correspondences of
model to image points.

Many investigators approximate the nonlinear perspective
projection via linear affine approximations. This is accurate
when the relative depth of object features is small
compared to the distance of the object from the camera.
Among the researchers that have addressed the full perspective
problem, Wunsch and Hirzinger [11] formalize the abstract
problem in a way similar to the approach advocated
here as the optimization of an objective function combining
correspondence and pose constraints. However, the correspondence
constraints are not represented analytically. The
method of Beveridge and Riseman [1] uses a random-start
local search with a hybrid pose estimation algorithm employing
both full-perspective and weak-perspective camera
models.

David et al. [4] recently proposed the SoftPOSIT algorithm
for simultaneous pose and correspondence determination
for the case of a 3D point model and its perspective
image. This algorithm integrates an iterative pose technique
called POSIT (Pose from Orthography and Scaling with ITerations)
[5], and an iterative correspondence assignment
technique called softassign [9] into a single iteration loop.
A global objective function is defined that captures the nature
of the problem in terms of both pose and correspondence
and combines the formalisms of both iterative techniques.
The correspondence and the pose are determined simultaneously
by applying a deterministic annealing schedule
and by minimizing this global objective function at each
iteration step.

We extend the SoftPOSIT algorithm from matching
point features to the case of matching line features: 3D
model lines are matched to image lines in 2D perspective
images. Lines detected in images are typically more stable
than points and are less likely to be produced by clutter and
noise, especially in man-made environments. Also, line features
are more robust to partial occlusion of the model. Our
current algorithm uses the SoftPOSIT algorithm for points
to determine the pose and correspondences for a set of image
and model lines. An iteration is performed where at
each step the given 2D to 3D line correspondence problem
is mapped to a new 2D to 3D point correspondence problem
which depends on the current estimate of the camera
pose. SoftPOSIT is then applied to improve the estimate of
the camera pose. This process is repeated until the pose and
correspondences converge.

In the following sections, we examine each step of the
method. We first review the SoftPOSIT algorithm for
computing pose from noncorresponding 2D image and 3D
model points. We then describe how this is used to solve
for pose when only line correspondences are available. Finally,
some experiments with simulated and real images are
shown.


2. Camera Models

Let P be a 3D point in a world coordinate frame with origin
O (figure 1). If a camera placed in this world frame is used
to view P, then the coordinates of this point in the camera
frame may be written as RP + T. Here, R is a 3 x 3 rotation
matrix representing the orientation of the camera frame
with respect to the world frame, and the translation T is
the vector from the camera center C to O, expressed in the
camera frame. Let the i-th row of R be denoted by $R_i$, and
let the translation be $T = (T_x, T_y, T_z)^T$.

Figure 1: The geometry of line correspondences.

We assume that the camera is calibrated, so that pixel coordinates
can be replaced by normalized image coordinates.
Then, the perspective image of a 3D point P in the world
frame is (x, y) where

$$ x = \frac{R_1 \cdot P + T_x}{R_3 \cdot P + T_z}, \qquad y = \frac{R_2 \cdot P + T_y}{R_3 \cdot P + T_z}. \qquad (1) $$

We will also need to use the weak perspective (also
known as scaled orthographic) projection model, which
makes the assumption that the depth of an object is small
compared to the distance of the object from the camera, and
that visible scene points are close to the optical axis. The
weak perspective model will be used iteratively in the process
of computing the full perspective pose. Under the weak
perspective assumption, $R_3 \cdot P \approx 0$ since $R_3$ is a unit vector
in the world coordinate frame that is parallel to the camera's
optic axis. The weak perspective image of a 3D point P in
the world frame is $(x^w, y^w)$ where

$$ x^w = \frac{R_1 \cdot P + T_x}{T_z}, \qquad y^w = \frac{R_2 \cdot P + T_y}{T_z}. \qquad (2) $$
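As a small numerical illustration of equations (1) and (2), the following sketch projects one point under both camera models. The pose and the point are made-up values chosen only so that the object is shallow relative to its distance; the function names are not from the paper.

```python
import numpy as np

def perspective(R, T, P):
    """Full perspective image of world point P, equation (1)."""
    X = R @ P + T                      # point expressed in the camera frame
    return X[:2] / X[2]

def weak_perspective(R, T, P):
    """Scaled orthographic image of P, equation (2): divide by T_z only."""
    X = R @ P + T
    return X[:2] / T[2]

# Hypothetical pose: identity rotation, object 20 units in front of the camera.
R = np.eye(3)
T = np.array([0.0, 0.0, 20.0])
P = np.array([1.0, -0.5, 0.4])         # shallow point near the optical axis

xy = perspective(R, T, P)
xy_w = weak_perspective(R, T, P)
w = (R[2] @ P) / T[2] + 1.0            # ratio of the two denominators
```

The two images differ only by the scalar factor `w`, which is close to one when the depth of the point is small compared to $T_z$.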


3.  Pose  from  Point  Correspondences 

Our  new  line  matching  algorithm  builds  on  the  SoftPOSIT 
algorithm  [4],  which  itself  builds  on  the  POSIT  algorithm 
[5].  This  section  of  the  paper  gives  an  overview  of  these 
two  algorithms. 

3.1.  The  POSIT  Algorithm 

The POSIT algorithm [5] computes an object's pose given
a set of corresponding 2D image and 3D object points. The
perspective image $(x_i, y_i)$ of a 3D world point $P_i$ is related
to the image $(x_i^w, y_i^w)$ produced by a scaled orthographic
camera according to

$$ x_i^w = w_i x_i, \qquad y_i^w = w_i y_i. \qquad (3) $$

Equation (3) is obtained by combining equations (1) and
(2). The term $w_i$ can be determined only if the camera pose
is known:

$$ w_i = R_3 \cdot OP_i / T_z + 1, \qquad (4) $$

where $OP_i$ is the vector in the camera coordinate frame
from the world origin to $P_i$. When the depth range of the
object along the optical axis is small compared to the object
distance, $R_3 \cdot OP_i$ is small compared to $T_z$, and $w_i \approx 1$.
This is exactly the assumption made when a perspective
camera is approximated by a scaled orthographic camera.

The POSIT algorithm starts by assuming that the perspective
image points are identical to the scaled orthographic
image points, so that $w_i = 1$ for all i. Under this
assumption, the camera pose can be determined by solving
a simple linear system of equations. This solution is
only approximate since $w_i = 1$ is only approximate. However,
given a more accurate estimate of the object's pose,
the accuracy of the $w_i$ terms can be improved by reestimating
these terms using equation (4). This process is repeated
until the pose converges.
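The iteration just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation of [5]: the linear step is written with the pose vectors $Q_1 = (R_1, T_x)/T_z$ and $Q_2 = (R_2, T_y)/T_z$ that Section 3.2 introduces, the scale $T_z$ is recovered from the geometric mean of their norms, and a fixed iteration count stands in for a convergence test.

```python
import numpy as np

def posit(P, xy, n_iter=50):
    """POSIT sketch. P: (n, 3) object points in the world frame;
    xy: (n, 2) perspective image points in normalized coordinates,
    with row i of xy the image of row i of P."""
    n = P.shape[0]
    S = np.hstack([P, np.ones((n, 1))])      # homogeneous object points
    w = np.ones(n)                           # start with w_i = 1
    for _ in range(n_iter):
        # Linear step: S @ Q1 = w * x and S @ Q2 = w * y in the least-squares sense.
        Q1 = np.linalg.lstsq(S, w * xy[:, 0], rcond=None)[0]
        Q2 = np.linalg.lstsq(S, w * xy[:, 1], rcond=None)[0]
        # Recover the pose from Q1 = (R1, Tx)/Tz and Q2 = (R2, Ty)/Tz.
        s1, s2 = np.linalg.norm(Q1[:3]), np.linalg.norm(Q2[:3])
        Tz = 1.0 / np.sqrt(s1 * s2)
        R1, R2 = Q1[:3] / s1, Q2[:3] / s2
        R3 = np.cross(R1, R2)
        R3 /= np.linalg.norm(R3)
        R = np.vstack([R1, R2, R3])
        T = np.array([Q1[3] * Tz, Q2[3] * Tz, Tz])
        # Re-estimate the corrections with equation (4).
        w = P @ R3 / Tz + 1.0
    return R, T
```

With noise-free data and an object that is small compared to its distance, the corrections settle after a few iterations and the true pose is a fixed point of the loop.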

3.2.  The  SoftPOSIT  Algorithm 

The SoftPOSIT algorithm [4] computes camera pose given
a set of 2D image and 3D object points, where the correspondences
between these two sets are not known a priori.
The SoftPOSIT algorithm builds on the POSIT algorithm
by integrating the softassign correspondence assignment algorithm
[8, 9]. For M image points and N object points,
the correspondences between the two sets are given by an
(M + 1) x (N + 1) assignment matrix m where $0 \le m_{ij} \le 1$.
Intuitively, the value of $m_{ij}$ ($1 \le i \le M$, $1 \le j \le N$)
specifies how well the i-th image point matches the j-th object
point. Initially, all $m_{ij}$ have approximately the same
value, indicating that correspondences are not known. Row
M + 1 and column N + 1 of m are the slack row and column,
respectively. The slack positions in m receive large values
when an image point does not match any model point or a
model point does not match any image point.

An object's pose can be parameterized by the two vectors
$Q_1 = (R_1, T_x)/T_z$ and $Q_2 = (R_2, T_y)/T_z$; given $Q_1$ and
$Q_2$, R and T can easily be determined. The homogeneous
object points are written as $S_j = (OP_j, 1)$. Then, given
the assignment weights $m_{ij}$ between the image and object
points, the error function

$$ E = \sum_{i=1}^{M} \sum_{j=1}^{N} m_{ij} \left( (Q_1 \cdot S_j - w_j x_i)^2 + (Q_2 \cdot S_j - w_j y_i)^2 \right) \qquad (5) $$

gives the sum of the squared distances between scaled orthographic
image points (approximated using the perspective
image points as in equation (3)) and the corresponding
(weighted by the $m_{ij}$) scaled orthographic images of the
3D object points (which depend on the object's estimated
pose, $Q_1$ and $Q_2$) [4]. The solution to the simultaneous
pose and correspondence problem consists of the $m_{ij}$, $Q_1$,
and $Q_2$ which minimize E. The function E is minimized
iteratively as follows:

1. Compute the correspondence variables $m_{ij}$ assuming
that pose is fixed.

2. Compute the pose vectors $Q_1$ and $Q_2$ assuming that
correspondences are fixed.

3. Compute the scaled orthographic corrections $w_i$ using
equation (4) and the new pose vectors.

Steps (1) and (2) are described in more detail below.

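Computing Pose. By assuming that the $m_{ij}$ are fixed,
the object pose which minimizes this error function is found
by solving $\partial E / \partial Q_1 = 0$ and $\partial E / \partial Q_2 = 0$. The solution is

$$ Q_1 = \Big( \sum_{j=1}^{N} m'_j S_j S_j^T \Big)^{-1} \Big( \sum_{i=1}^{M} \sum_{j=1}^{N} m_{ij} w_j x_i S_j \Big), \qquad (6) $$

$$ Q_2 = \Big( \sum_{j=1}^{N} m'_j S_j S_j^T \Big)^{-1} \Big( \sum_{i=1}^{M} \sum_{j=1}^{N} m_{ij} w_j y_i S_j \Big), \qquad (7) $$

where $m'_j = \sum_{i=1}^{M} m_{ij}$ [4]. Computing $Q_1$ and $Q_2$ requires
the inversion of the 4 x 4 matrix $\sum_{j=1}^{N} m'_j S_j S_j^T$,
which is inexpensive.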
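Equations (5) to (7) translate almost directly into array operations. The sketch below is illustrative only: it assumes the slack row and column have already been stripped from m, and the function and argument names are not from the paper.

```python
import numpy as np

def pose_update(m, S, w, x, y):
    """Equations (6) and (7). m: (M, N) assignment weights without slack;
    S: (N, 4) homogeneous object points; w: (N,) corrections w_j;
    x, y: (M,) image point coordinates."""
    m_col = m.sum(axis=0)                    # m'_j = sum_i m_ij
    L = (S * m_col[:, None]).T @ S           # sum_j m'_j S_j S_j^T  (4 x 4)
    bx = ((m.T @ x) * w) @ S                 # sum_i sum_j m_ij w_j x_i S_j
    by = ((m.T @ y) * w) @ S                 # sum_i sum_j m_ij w_j y_i S_j
    return np.linalg.solve(L, bx), np.linalg.solve(L, by)

def error_E(m, S, w, x, y, Q1, Q2):
    """Objective function of equation (5)."""
    ex = (S @ Q1)[None, :] - np.outer(x, w)  # entry (i, j): Q1.S_j - w_j x_i
    ey = (S @ Q2)[None, :] - np.outer(y, w)
    return float(np.sum(m * (ex ** 2 + ey ** 2)))
```

Solving the 4 x 4 system with `np.linalg.solve` avoids forming the inverse explicitly; the result is the same as in equations (6) and (7) whenever the matrix is nonsingular.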

Computing Correspondences. The correspondence
variables $m_{ij}$ are optimized assuming the pose of the object
is known and fixed. The goal is to find a zero-one
assignment matrix, $m = \{m_{ij}\}$, that explicitly specifies
the matches between a set of M image points and a set of
N object points, and that minimizes the objective function
E. The assignment matrix must satisfy the constraint that
each image point match at most one object point, and vice
versa (i.e., $\sum_k m_{ik} = \sum_k m_{kj} = 1$ for all i and j). The
objective function E will be minimized if the assignment
matrix matches image and object points with the smallest
distances $d_{ij}$. (Section 5 describes how these distances are
computed.) This problem can be solved by the iterative softassign
technique [8, 9]. The iteration for the assignment
matrix begins with a matrix $m^0$ in which element $m^0_{ij}$ is
initialized to $\exp(-\beta (d_{ij} - \alpha))$, with $\beta$ very small, and
with all elements in the slack row and slack column set to a
small constant. The parameter $\alpha$ determines how large the
distance between two points must be before we consider
them unmatchable. The continuous assignment matrix converges
toward a discrete matrix due to two mechanisms that
are used concurrently:

1. First, a technique due to Sinkhorn [10] is applied.
When each row and column of a square correspondence
matrix is normalized (several times, alternatingly)
by the sum of the elements of that row or column
respectively, the resulting matrix has positive elements
with all rows and columns summing to one. When the
matrix is not square, the sums of the rows and columns
will be close to, but not exactly equal to one.

2. The term $\beta$ is increased as the iteration proceeds. As
$\beta$ increases and each row or column of m is renormalized,
the terms $m_{ij}$ corresponding to the smallest
$d_{ij}$ tend to converge to one, while the other terms tend
to converge to zero. This is a deterministic annealing
process [7] known as softmax [2]. This is a desirable
behavior, since it leads to an assignment of correspondences
that satisfy the matching constraints and whose
sum of distances is minimized.
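The two mechanisms above can be sketched for a single value of $\beta$ as follows. This is a hedged illustration: the slack constant, the fixed number of Sinkhorn passes, and the choice to leave the slack row and slack column themselves unnormalized are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def softassign(d, beta, alpha, slack=1e-3, n_sinkhorn=30):
    """Build an (M+1, N+1) assignment matrix from an (M, N) distance matrix d.
    The last row and last column are the slack row and column."""
    M, N = d.shape
    m = np.full((M + 1, N + 1), slack)
    m[:M, :N] = np.exp(-beta * (d - alpha))  # initial weights m^0_ij
    for _ in range(n_sinkhorn):
        # Sinkhorn: normalize the non-slack rows, then the non-slack columns.
        m[:M, :] /= m[:M, :].sum(axis=1, keepdims=True)
        m[:, :N] /= m[:, :N].sum(axis=0, keepdims=True)
    return m

# Three image features and three object features; the diagonal pairs are close.
d = np.array([[0.1, 5.0, 5.0],
              [5.0, 0.2, 5.0],
              [5.0, 5.0, 0.1]])
m = softassign(d, beta=5.0, alpha=1.0)
```

In an annealing loop this function would be called repeatedly with a growing `beta`, so that the weights for the smallest distances approach one and the others approach zero.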


This combination of deterministic annealing and Sinkhorn's
technique in an iteration loop was called softassign by Gold
and Rangarajan [8, 9]. The matrix m resulting from an iteration
loop that comprises these two substeps is the assignment
that minimizes the global objective function E. These
two substeps are interleaved in an iteration loop along with
the substeps that optimize the pose.

4. Pose from Unknown Line Correspondences

4.1. Geometry of Line Correspondences

Each 3D line in an object is represented by the two 3D endpoints
of that line whose coordinates are expressed in the
world frame: $L_j = (P_j, P'_j)$. See figure 1. A line $l_i$ in
the image is defined by the two 2D endpoints, $p_i$ and $p'_i$, of
the line and is represented by the plane of sight that passes
through $l_i$ and the camera center C. The normal to this
plane is $n_i = (p_i, 1) \times (p'_i, 1)$, and 3D points P in the
camera frame lying on this plane satisfy $n_i^T P = 0$.

Let us assume that image line $l_i$ corresponds to object
line $L_j$. If the object has pose given by R and T, then
$S_j = R P_j + T$ and $S'_j = R P'_j + T$ lie on the plane of
sight through $l_i$. When R and T are erroneous and only
approximate the true pose, the closest points to $S_j$ and $S'_j$
which satisfy this incidence constraint are the orthogonal
projections of $S_j$ and $S'_j$ onto the plane of sight of $l_i$:

$$ S_{ij} = R P_j + T - [(R P_j + T) \cdot n_i] \, n_i, $$
$$ S'_{ij} = R P'_j + T - [(R P'_j + T) \cdot n_i] \, n_i, \qquad (8) $$

where we have assumed that $n_i$ has been normalized to a
unit vector. Under the approximate pose R and T, the image
points corresponding to object points $P_j$ and $P'_j$ can be
approximated as the images of $S_{ij}$ and $S'_{ij}$:

$$ p_{ij} = \frac{(S_{ij,x}, \; S_{ij,y})}{S_{ij,z}}, \qquad p'_{ij} = \frac{(S'_{ij,x}, \; S'_{ij,y})}{S'_{ij,z}}. \qquad (9) $$

4.2. Computing Pose and Correspondences

The pose and correspondence algorithm for points (SoftPOSIT)
involves iteratively refining estimates of the pose
and correspondences for the given 2D and 3D point sets.
The new algorithm for lines builds on this approach by additionally
refining in the iteration a set of estimated images
of the endpoints of the 3D object lines. With this estimated
image point set, and the set of object line endpoints, SoftPOSIT
is used on each iteration to compute a refined estimate
of the object's pose.

On any iteration of the line algorithm, the images of the
3D object line endpoints are estimated by the point set


$$ P_{img}(R, T) = \{ p_{ij}, \; p'_{ij}, \; 1 \le i \le M, \; 1 \le j \le N \}, \qquad (10) $$

which is computed using equations (8) and (9). For every
3D endpoint of an object line, there are M possible images
of that point, one for each image line. This set of 2MN
image points depends on the current estimate of the object's
pose, and thus changes from iteration to iteration. The object
points used by SoftPOSIT are fixed: they are the set of 2N
object line endpoints, $P_{obj} = \{ P_j, \; P'_j, \; 1 \le j \le N \}$.
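The construction of the estimated image point set from equations (8) and (9) can be sketched as below. The array layout (one (M, N, 2) array for the first endpoints and one for the second) and the function names are choices made for this illustration, not part of the paper.

```python
import numpy as np

def plane_normal(p, p_prime):
    """Unit normal n_i of the plane of sight through image endpoints p, p'."""
    n = np.cross(np.append(p, 1.0), np.append(p_prime, 1.0))
    return n / np.linalg.norm(n)

def estimated_image_points(R, T, P, P_prime, p, p_prime):
    """P, P_prime: (N, 3) object line endpoints in the world frame;
    p, p_prime: (M, 2) image line endpoints in normalized coordinates.
    Returns two (M, N, 2) arrays holding p_ij and p'_ij."""
    normals = np.array([plane_normal(a, b) for a, b in zip(p, p_prime)])  # (M, 3)
    out = []
    for Q in (P, P_prime):
        S = Q @ R.T + T                                  # (N, 3) camera-frame points
        dist = normals @ S.T                             # (M, N): S_j . n_i
        # Equation (8): orthogonal projection onto each plane of sight.
        S_ij = S[None, :, :] - dist[:, :, None] * normals[:, None, :]
        out.append(S_ij[..., :2] / S_ij[..., 2:3])       # equation (9)
    return out[0], out[1]
```

When the pose is exact and image line i is the image of object line j, the projection in equation (8) leaves the endpoints unchanged, so the estimated image points coincide with the true image endpoints.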

We now have a set of 2MN image points and a set of
2N object points. To use SoftPOSIT, an assignment matrix
between the two sets is needed. The initial assignment
matrix for point sets $P_{img}$ and $P_{obj}$ is computed from the
distances between the image and model lines as discussed
in section 5. If $l_i$ and $L_j$ have distance $d_{ij}$, then all points
in $P_{img}$ and $P_{obj}$ derived from $l_i$ and $L_j$ will also have distance
$d_{ij}$. Although the size of this assignment matrix is
(2MN + 1) x (2N + 1), only 4MN of its values are nonzero
(not counting the slack row and column). Thus, with a careful
implementation, the current algorithm for line features
will have the same run-time complexity as the SoftPOSIT
algorithm for point features, which was empirically determined
to be $O(M^2 N)$ [4].

The  following  is  high-level  pseudocode  for  the  line- 
based  SoftPOSIT  algorithm. 

1. Initialize: R, T, $\beta$, $P_{obj}$.

2. Project the model lines into the image using the current
pose estimate. Compute the distances $d_{ij}$ between the
true image lines and the projected model lines.

3. Initialize the assignment matrix as $m^0_{ij} =
\exp(-\beta (d_{ij} - \alpha))$ and then compute m by normalizing
with Sinkhorn's algorithm.

4. Compute $P_{img}(R, T)$ (equation (10)).

5. Solve for $Q_1$ and $Q_2$ (equations (6) and (7)) using m
and the point sets $P_{obj}$ and $P_{img}$, and then compute R
and T from $Q_1$ and $Q_2$.

6. Stop if R and T have converged; otherwise, set $\beta =
\beta_{update} \cdot \beta$ and go to step (2).

The algorithm described above performs a deterministic annealing
search starting from an initial guess for the object's
pose. However, it provides only a local optimum. A common
way of searching for a global optimum, and the one
taken here, is to run the algorithm starting from a number
of different initial guesses, and keep the first solution that
meets a specified termination criterion. Our initial guesses
range over $[-\pi, \pi]$ for the three Euler angles, and over a 3D
space of translations containing the true translation. We use
a random number generator to generate these initial guesses.
See [4] for details.
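A minimal sketch of this random-restart strategy is given below. The Euler angle convention, the uniform sampling of the translation box, and the `refine` and `accept` callables (standing in for the annealing search and the termination criterion) are all assumptions made for illustration; see [4] for the procedure actually used.

```python
import numpy as np

def euler_to_rotation(ax, ay, az):
    """Rotation matrix Rz(az) @ Ry(ay) @ Rx(ax); the axis order is an assumption."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def random_initial_poses(n, t_low, t_high, seed=0):
    """Yield n random (R, T) guesses: Euler angles uniform in [-pi, pi],
    translations uniform in the box [t_low, t_high]."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        angles = rng.uniform(-np.pi, np.pi, size=3)
        T = rng.uniform(t_low, t_high)
        yield euler_to_rotation(*angles), T

def multi_start(refine, accept, guesses):
    """Run the local search from each guess; keep the first accepted result."""
    for R0, T0 in guesses:
        result = refine(R0, T0)
        if accept(result):
            return result
    return None
```

Returning the first accepted solution, rather than the best over all starts, matches the strategy described above and keeps the expected cost proportional to the number of starts needed before one lands in the region of convergence.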

5.  Distance  Measures 

The size of the region of convergence to the true pose
is affected by the distance measure employed in the correspondence
optimization phase of the algorithm. The line-based
SoftPOSIT algorithm applies SoftPOSIT to point features
where the distances associated with these point features
are calculated from the line features. The two main
distinguishing features between the different distance measures
are (1) whether distances are measured in 3-space or
in the image plane, and (2) whether lines are treated as having
finite or infinite length. The different distance measures
that we experimented with are described below.

The first distance measure that we tried measures distances
in the image plane, but implicitly assumes that both
image and projected model lines have infinite length. This
metric applies a type of Hough transform to all lines (image
and projected model) and then measures the distance
in this transformed space. The transform that is applied
maps an infinite line $l$ to the 2D point $h_k(l)$ on that line
which is closest to some fixed reference point $r_k$. The
distance between an image line $l_i$ and the projection $l'_j$ of
object line $L_j$ with respect to reference point $r_k$ is then
$d^k_{ij} = \| h_k(l_i) - h_k(l'_j) \|$. Because this Hough line distance
is biased with respect to the reference point $r_k$, for each
pair of image and projected object line, we sum the distances
computed using five different reference points, one
at each corner of the image and one at the image center:
$d_{ij} = \sum_{k=1}^{5} \| h_k(l_i) - h_k(l'_j) \|$.
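The Hough line distance can be sketched as follows. Lines are given here by any two distinct points on them, and the image size passed to `reference_points` is a hypothetical value used only to place the five reference points.

```python
import numpy as np

def closest_point(a, b, r):
    """Point h_k(l) on the infinite line through a and b that is closest to r."""
    u = (b - a) / np.linalg.norm(b - a)
    return a + ((r - a) @ u) * u

def reference_points(width, height):
    """Four image corners plus the image center."""
    return np.array([[0, 0], [width, 0], [0, height], [width, height],
                     [width / 2, height / 2]], dtype=float)

def hough_distance(line_i, line_j, refs):
    """d_ij: sum over reference points r_k of ||h_k(l_i) - h_k(l'_j)||."""
    (a1, b1), (a2, b2) = line_i, line_j
    return sum(np.linalg.norm(closest_point(a1, b1, r) - closest_point(a2, b2, r))
               for r in refs)

refs = reference_points(640, 480)
l1 = (np.array([0.0, 0.0]), np.array([10.0, 0.0]))
l2 = (np.array([3.0, 1.0]), np.array([7.0, 1.0]))   # parallel, one unit away
l3 = (np.array([2.0, 0.0]), np.array([4.0, 0.0]))   # a piece of the same line as l1
```

Because only the supporting infinite lines matter, `l1` and `l3` have distance zero even though the segments differ, while the parallel pair `l1`, `l2` contributes one unit per reference point.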


The second distance measure that we tried measures distances
in the image plane between finite length line segments.
The distance between image line $l_i$ and the projection
$l'_j$ of object line $L_j$ is $d_{ij} = \Delta\theta(l_i, l'_j) + \rho \, d(l_i, l'_j)$
where $\Delta\theta(l_i, l'_j)$ measures the difference in the orientation
of the lines, $d(l_i, l'_j)$ measures the difference in the
location of the lines, and $\rho$ is a scale factor that determines
the relative importance of orientation and location.
$\Delta\theta(l_i, l'_j) = 1 - |\cos(\angle l_i l'_j)|$ where $\angle l_i l'_j$ denotes the angle
between the lines. Because lines detected in an image are
usually fragmented, corresponding only to pieces of object
lines, $d(l_i, l'_j)$ is the sum of the distances of each endpoint of
$l_i$ to the closest point on the finite line segment $l'_j$. So, for
a correct pose, $d(l_i, l'_j) = 0$ even when $l_i$ is only a partial
detection of $L_j$. This distance measure has produced better
performance than the previous measure, resulting in larger
regions of convergence and fewer iterations to
converge.
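A sketch of this segment distance follows. The value of $\rho$ used in the example is arbitrary, and the function names are illustrative.

```python
import numpy as np

def point_to_segment(q, a, b):
    """Distance from point q to the closest point on the finite segment ab."""
    ab = b - a
    t = np.clip((q - a) @ ab / (ab @ ab), 0.0, 1.0)
    return np.linalg.norm(q - (a + t * ab))

def segment_distance(image_line, model_line, rho):
    """d_ij = dtheta + rho * d for image segment l_i and projected segment l'_j."""
    (p, p2), (a, b) = image_line, model_line
    u, v = p2 - p, b - a
    cos_angle = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    dtheta = 1.0 - cos_angle                   # orientation difference
    d = point_to_segment(p, a, b) + point_to_segment(p2, a, b)
    return dtheta + rho * d

model = (np.array([0.0, 0.0]), np.array([10.0, 0.0]))
partial = (np.array([2.0, 0.0]), np.array([5.0, 0.0]))   # fragment of the model line
d_partial = segment_distance(partial, model, rho=0.5)
d_reverse = segment_distance(model, partial, rho=0.5)
```

The measure is deliberately asymmetric: a fragmented image line lying on the projected model line has distance zero, whereas swapping the roles penalizes the endpoints that extend past the shorter segment.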

6.  Experiments 

6.1.  Simulated  Images 

Our initial evaluation of the algorithm is with simulated
data. Random 3D line models are generated by selecting
a number of random points in the unit sphere and then connecting
each of these points to a small number of the closest
remaining points. An image of the model is generated by
the following procedure:

1.  Projection:  Project  all  3D  model  lines  into  the  image 
plane. 

2. Noise: Perturb with normally distributed noise the locations
of the endpoints of each line.

3.  Occlusion:  Delete  randomly  selected  image  lines. 

4.  Missing  ends:  Chop  off  a  small  random  length  of  the 
end  of  each  line.  This  step  simulates  the  difficulty  of 
detecting  lines  all  the  way  into  junctions. 

5.  Clutter:  Add  a  number  of  lines  of  random  length  to 
random  locations  in  the  image.  The  clutter  lines  will 
not  intersect  any  other  line. 
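The data generation procedure above can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: "closest remaining points" is read as the nearest neighbours among later points in the list, only one end of each line is shortened, clutter endpoints are drawn uniformly over the extent of the projected model, and the check that clutter lines do not intersect other lines is omitted.

```python
import numpy as np

def random_line_model(n_points, n_neighbors, rng):
    """Random 3D line model: points in the unit sphere, each joined to its
    nearest neighbours among the points that come after it."""
    pts = rng.normal(size=(n_points, 3))
    radius = rng.uniform(size=(n_points, 1)) ** (1 / 3)
    pts *= radius / np.linalg.norm(pts, axis=1, keepdims=True)
    lines = []
    for i in range(n_points - 1):
        rest = np.arange(i + 1, n_points)
        order = np.argsort(np.linalg.norm(pts[rest] - pts[i], axis=1))
        lines += [(pts[i], pts[j]) for j in rest[order[:n_neighbors]]]
    return np.array(lines)                                   # (L, 2, 3)

def simulate_image(lines3d, R, T, rng, sigma=0.0015, p_occlude=0.4,
                   max_chop=0.1, n_clutter=10):
    """Return an (n_lines, 2, 2) array of image line endpoints."""
    X = lines3d @ R.T + T                                    # camera frame
    seg = X[..., :2] / X[..., 2:3]                           # 1. projection
    seg = seg + rng.normal(scale=sigma, size=seg.shape)      # 2. noise
    seg = seg[rng.uniform(size=len(seg)) >= p_occlude]       # 3. occlusion
    chop = rng.uniform(0.0, max_chop, size=(len(seg), 1))
    seg[:, 1] -= chop * (seg[:, 1] - seg[:, 0])              # 4. missing ends
    lo, hi = seg.min(), seg.max()
    clutter = rng.uniform(lo, hi, size=(n_clutter, 2, 2))    # 5. clutter
    return np.concatenate([seg, clutter])
```

A call such as `simulate_image(random_line_model(20, 2, rng), R, T, rng)` with `rng = np.random.default_rng(0)` and a pose that places the model in front of the camera produces one test image; the surviving model lines come first in the output, followed by the clutter lines.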

Figure 2 shows our algorithm determining the pose and correspondence
of a random 3D model with 30 lines, from a
simulated image with 40% occlusion of the model, 40%
of the image lines being clutter, and normally distributed
noise with standard deviation 0.15% of the image dimension
(about 1 pixel for a 640 x 480 image). As seen in this
figure, the initial and final projections of the model differ
greatly, and so it would be difficult for a person to determine
the correspondence of image lines to model lines from
the initial projection of the model into the image. Our algorithm,
however, is often successful at finding the true pose
from such initial guesses. Although we have not yet done a
quantitative evaluation of the algorithm, anecdotal evidence
suggests that under 50% occlusion and 50% clutter, the algorithm
finds the true pose in about 50% of trials when the
initial guess for the pose differs from the true pose by no
more than about 30° of rotation about each of the x, y, and
z axes. (The initial rotation of the model shown in figure 2
differs from that of the true pose by 28° about each of the
coordinate axes.)

Figure 2: Example application of our algorithm to a cluttered image. The eight frames on the left show the estimated pose
at initialization (upper left) and at steps 1, 3, 5, 12, 20, 27, and 35 of the iteration. The thin lines are the image lines and the
bold lines are the projection of the model at the current step of the iteration. The correct pose has been found by iteration
step 35. The right side of this figure shows the evolution of the assignment matrix at the corresponding steps of the iteration.
Because of the way the simulated data was generated, the correct assignments lie near the main diagonal of the assignment
matrix. Image lines are indexed along the vertical axis, and model lines along the horizontal axis. Brighter pixels in these
figures correspond to greater weight in the assignment matrix. The correct assignments have been found by iteration step 35.
Unmatched image and object points are apparent by the large values in the last row and column of the assignment matrix.

6.2.  Real  Images 

Figure 3 shows the results of applying our algorithm to the
problem of a robotic vehicle using imagery and a 3D CAD
model of a building to navigate through the building. A
Canny edge detector is first applied to an image to produce
a binary edge image. This is followed by a Hough transform
and edge tracking to generate a list of straight lines
present in the image. This process generates many more
lines than are needed to determine a model's pose, so only
a small subset is used by the algorithm in computing pose
and correspondence. Also, the CAD model of the building
is culled to include only those 3D lines near the camera's
estimated position.

7.  Conclusions 

The simultaneous determination of model pose and model-to-image
feature correspondence is very difficult in the presence
of model occlusion and image clutter. Experiments
with the line-based SoftPOSIT algorithm show that it is capable
of quickly solving high-clutter, high-occlusion problems,
even when the initial guess for the model pose is
far from the true pose. The algorithm solves problems for
which a person viewing the image and initial model projection
would have no idea how to improve the model's pose or how
to assign feature correspondences.

We are interested in determining the complexity of the
algorithm when no information is available to constrain the
model's pose, except for the fact that the model is visible
in the image. This will allow us to compare the efficiency
of line-based SoftPOSIT to other algorithms. The key parameter
that needs to be determined is the number of initial
guesses required to find a good pose, as a function of clutter,
occlusion, and noise. We expect that the line-based algorithm
will require many fewer initial guesses than the
point-based SoftPOSIT algorithm.